fix: suffixes_prefixes_titles always reflects current set state#166
fix: suffixes_prefixes_titles always reflects current set state#166gaoflow wants to merge 5 commits into
Conversation
The `suffixes_prefixes_titles` property on `Constants` cached its result in `_pst` after the first access. Any subsequent `add()` or `remove()` call on `titles`, `prefixes`, `suffix_acronyms`, or `suffix_not_acronyms` was silently ignored by the cache, so `is_rootname()` kept returning the stale result. Concretely, a word added to `C.titles` after the property was first accessed would still be treated as a rootname (first/middle/last) by `join_on_conjunctions`, even though `is_title()` correctly returned `True` for it. This contradicts the documented behaviour of per-instance config customisation described in AGENTS.md. Fix: drop the `_pst` cache entirely and compute the union fresh on every access. The four-set union is cheap and the simplest correct approach. Add five tests that assert the property and `is_rootname` stay consistent with the live sets after `add()`/`remove()` calls.
…_setattr__ The original cache was dropped to fix staleness, but recomputing the set union on every call is ~1000x slower. This restores the cache with a correct invalidation strategy: SetManager fires an on_change callback after add/remove, and Constants.__setattr__ clears _pst and re-wires the callback whenever one of the four contributing attrs is replaced (covering both user mutations and conftest teardown restores). The conftest manual `CONSTANTS._pst = None` reset is removed — it is now handled automatically by __setattr__ during collection restore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lper Add missing test cases flagged during PR review: - prime-then-mutate test that would catch a regression to cached staleness - add tests for suffix_acronyms and suffix_not_acronyms (previously untested) - remove test for prefixes (mirroring the existing titles remove test) - assertFalse(x in ...) -> assertNotIn for the two remove tests Also adds assertNotIn to HumanNameTestBase to support the new assertions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Two follow-up commits added based on review feedback:
The original fix dropped the cache entirely, which is ~1000x slower per call (~69 µs vs ~70 ns). This commit restores it with correct invalidation:
690 tests pass (682 pre-existing + 8 new × dual-run fixture). |
Six of the suffixes_prefixes_titles tests passed against the unfixed code because they read the property only once on a cold cache, so they asserted "the union includes its source sets" rather than "the cache stays consistent after a mutation." Prime the cache before mutating so they actually exercise invalidation; verified they now fail (14 across the dual-run fixture) against master's pre-fix config. Change the property guard from `if not self._pst` to `if self._pst is None` so a legitimately empty union caches instead of recomputing on every access — and so "cached" means "computed," which is the state the primed tests rely on. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… leak Replace the catch-all Constants.__setattr__ + _PST_ATTRS frozenset with a _CachedUnionMember descriptor on the four cached attributes (prefixes, suffix_acronyms, suffix_not_acronyms, titles). The invalidation/rewiring behavior now lives beside the attributes it governs, runs only for those four (not on every attribute assignment), and removes the hand-synced attribute-name list. Fixes a stale-callback leak: reassigning a manager now detaches the replaced one (_on_change = None) so an orphaned reference can no longer invalidate the live cache. Drop the unused on_change constructor parameter from SetManager — wiring is always done by the owning Constants after construction (e.g. the config-teardown setattr path), so the constructor channel was dead and misleading. Behavior, mypy, and ruff are unchanged; parse throughput is identical to the cached baseline (~18us/name vs ~263us/name without the cache). Tests: add coverage for the replaced-manager rewiring path, the orphan-no-longer-invalidates guarantee, add_with_encoding invalidation, and suffix-set removal after priming; add an assertIs shim to the test base. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
A follow-up on the cache-invalidation machinery added in
The fix moves the behavior onto a small
The in-place Behavior, New tests cover the paths the previous version left unguarded: the replaced-manager re-wiring path, the "orphaned manager no longer invalidates the live cache" guarantee (asserted by cache identity, since the leak's only symptom is an unnecessary recompute), |
Problem
Constants.suffixes_prefixes_titlescached its result in_pstafter thefirst access and never invalidated that cache. Any subsequent
add()orremove()call ontitles,prefixes,suffix_acronyms, orsuffix_not_acronymswas silently ignored by the stale cache.This creates an observable inconsistency:
is_title()/is_prefix()/is_suffix()query the live sets and return the correct answer, butis_rootname()— which delegates tosuffixes_prefixes_titles— keepsreturning the stale cached answer, causing it to contradict the other helpers.
Minimal reproduction:
The same inconsistency affects titles and suffixes added or removed after the
property is first read.
join_on_conjunctionsrelies onis_rootnametocount
rootname_piecesand choose whether to join single-letter conjunctions,so a stale
_pstcan silently skew that decision.Fix
Drop
_pstentirely and compute the set union fresh on every access. Theunion of four
SetManagerinstances is a simpleO(n)operation with a smallconstant and is called only during name parsing, so there is no meaningful
performance cost.
Tests
Five new tests in
tests/test_constants.pyverify that:suffixes_prefixes_titlesreflects titles and prefixes added afterconstruction.
suffixes_prefixes_titlesno longer contains a title that was removed.is_rootnameandis_title/is_prefixremain consistent afteradd()/remove()calls.All 682 tests pass (672 pre-existing + 10 from the dual-run fixture applied
to the 5 new test methods).
This pull request was prepared with the assistance of AI, under my direction and review.